Occam's Two Razors: The Sharp and the Blunt

Author

  • Pedro Domingos
Abstract

Occam's razor has been the subject of much controversy. This paper argues that this is partly because it has been interpreted in two quite different ways, the first of which (simplicity is a goal in itself) is essentially correct, while the second (simplicity leads to greater accuracy) is not. The paper reviews the large variety of theoretical arguments and empirical evidence for and against the "second razor," and concludes that the balance is strongly against it. In particular, it builds on the case of (Schaffer, 1993) and (Webb, 1996) by considering additional theoretical arguments and recent empirical evidence that the second razor fails in most domains. A version of the first razor more appropriate to KDD is proposed, and we argue that continuing to apply the second razor risks causing significant opportunities to be missed.

William of Occam's famous razor states that "Nunquam ponenda est pluralitas sine necessitate," which, approximately translated, means "Entities should not be multiplied beyond necessity" (Tornay 1938). It was born in the late Middle Ages as a criticism of scholastic philosophy, whose theories grew ever more elaborate without any corresponding improvement in predictive power. In the intervening centuries it has come to be seen as one of the fundamental tenets of modern science, and today it is often invoked by learning theorists and KDD practitioners as a justification for preferring simpler models over more complex ones. However, formulating Occam's razor in KDD terms is trickier than it might appear at first. Leaving aside for the moment the question of how to measure simplicity, let the generalization error of a model be its error rate on unseen examples, and its training-set error be its error on the examples it was learned from. Then the formulation that is perhaps closest to Occam's original intent is:

First razor: Given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself.
On the other hand, within KDD Occam's razor is often used in a quite different sense, which can be stated as:

Second razor: Given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error.

We believe that it is important to distinguish clearly between these two versions of Occam's razor. The first one is largely uncontroversial, while the second one, taken literally, is false. Several theoretical arguments and pieces of empirical …
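The two error notions above can be made concrete in a small experiment. The sketch below is a toy illustration, not from the paper: the data generator, the 10% noise rate, and both models (a lookup table that memorizes the training set versus a single learned threshold) are my own assumptions. It reports training-set error and held-out (generalization) error for each model. Note that this toy case happens to favor the simpler model; the paper's thesis is precisely that such a pattern does not hold in general.

```python
import random

random.seed(0)

def label(x):
    """True concept: threshold at 0.5, with 10% label noise."""
    y = 1 if x > 0.5 else 0
    return y if random.random() > 0.1 else 1 - y

train = [(x, label(x)) for x in (random.random() for _ in range(50))]
test = []
for _ in range(1000):
    x = random.random()
    test.append((x, label(x)))

# Complex model: memorize the training set exactly; predict the
# training-set majority class on any point it has never seen.
memo = {x: y for x, y in train}
majority = round(sum(y for _, y in train) / len(train))
def complex_model(x):
    return memo.get(x, majority)

# Simple model: the single threshold with lowest training-set error.
best_t, best_err = 0.0, 1.0
for t in (i / 100 for i in range(101)):
    err = sum((1 if x > t else 0) != y for x, y in train) / len(train)
    if err < best_err:
        best_t, best_err = t, err
def simple_model(x):
    return 1 if x > best_t else 0

def error(model, data):
    """Fraction of examples the model misclassifies."""
    return sum(model(x) != y for x, y in data) / len(data)

print("training-set error:   complex =", error(complex_model, train),
      " simple =", error(simple_model, train))
print("generalization error: complex =", error(complex_model, test),
      " simple =", error(simple_model, test))
```

The memorizer achieves zero training-set error by construction, yet on unseen points it can only fall back on the majority class, so its generalization error is far worse than the threshold rule's.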


Similar papers


Generalized Graph Colorability and Compressibility of Boolean Formulae

In this paper, we study the possibility of Occam's razors for a widely studied class of Boolean formulae: Disjunctive Normal Forms (DNF). An Occam's razor is an algorithm which compresses the knowledge of observations (examples) into small formulae. We prove that approximating the minimally consistent DNF formula, and a generalization of graph colorability, is very hard. Our proof technique is s...


Extending Occam's Razor

Occam's Razor states that, all other things being equal, the simpler of two possible hypotheses is to be preferred. A quantified version of Occam's Razor has been proven for the PAC model of learning, giving sample-complexity bounds for learning using what Blumer et al. call an Occam algorithm [1]. We prove an analog of this result for Haussler's more general learning model, which encompasses le...


Computational Learning Theory, Fall Semester 2010, Lecture 3: October 31

In this lecture we will talk about the PAC model. The PAC learning model is one of the most important and well-known learning models. PAC stands for Probably Approximately Correct; our goal is to learn a hypothesis from a hypothesis class such that with high confidence we will have a small error rate (approximately correct). We start the lecture with an intuitive example to explain the idea behind the PAC m...
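The guarantee sketched above has a standard quantitative form in the simplest setting. This is the textbook bound for a finite hypothesis class and a consistent learner (a general fact about the PAC model, not taken from this lecture snippet itself):

```latex
% PAC sample complexity, finite class H, consistent learner:
% if the learner sees m i.i.d. examples with
%
%   m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
%
% and outputs any h \in H consistent with all of them, then with
% probability at least 1 - \delta its true error is at most \epsilon.
```

Intuitively, ln|H| is the "size" penalty for the hypothesis class, which is what connects the PAC model to Occam-style arguments: smaller (simpler) classes need fewer examples for the same guarantee.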


Partial Occam's Razor and Its Applications

We introduce the notion of a "partial Occam algorithm". A partial Occam algorithm produces a succinct hypothesis that is partially consistent with given examples, where the proportion of consistent examples is a bit more than half. By using this new notion, we propose one approach for obtaining a PAC learning algorithm. First, as shown in this paper, a partial Occam algorithm is equivalent to a w...



Publication date: 1998